Instacart Market Basket Analysis Dataset

In this article, we work with the Instacart Market Basket Analysis Dataset from Kaggle. The dataset can be downloaded from Kaggle.com or from instacart.com.

Data Description

The dataset for this competition is a relational set of files describing customers' orders over time. The goal of the competition is to predict which products will be in a user's next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, we provide between 4 and 100 of their orders, with the sequence of products purchased in each order. We also provide the day of the week and hour of the day the order was placed, and a relative measure of time between orders. For more information, see the blog post accompanying its public release.

Data Dictionary

The data dictionary is available here.

orders (3.4m rows, 206k users):

products (50k rows):

aisles (134 rows):

departments (21 rows):

order_products__SET (30m+ rows):

where SET is one of the following evaluation sets (eval_set in orders): prior, train, or test.
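
Because the files are relational, most analyses start by joining them. The following is a minimal sketch using only the Python standard library. It assumes the column names used in the Kaggle CSV files (order_id, product_id, product_name, aisle_id, department_id, aisle, department); the sample rows and ids are made up for illustration and are not taken from the dataset.

```python
import csv
import io


def read_rows(fileobj):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))


def index_by(rows, key):
    """Build a lookup table from a key column to the full row."""
    return {row[key]: row for row in rows}


def denormalize(order_products, products, aisles, departments):
    """Attach product, aisle, and department names to each order line."""
    product_by_id = index_by(products, "product_id")
    aisle_by_id = index_by(aisles, "aisle_id")
    department_by_id = index_by(departments, "department_id")
    joined = []
    for line in order_products:
        product = product_by_id[line["product_id"]]
        joined.append({
            "order_id": line["order_id"],
            "product_name": product["product_name"],
            "aisle": aisle_by_id[product["aisle_id"]]["aisle"],
            "department": department_by_id[product["department_id"]]["department"],
        })
    return joined


# Made-up sample rows standing in for the real files (assumed column names).
products = read_rows(io.StringIO(
    "product_id,product_name,aisle_id,department_id\n"
    "1,Example Cookies,61,19\n"
))
aisles = read_rows(io.StringIO("aisle_id,aisle\n61,cookies cakes\n"))
departments = read_rows(io.StringIO("department_id,department\n19,snacks\n"))
order_products = read_rows(io.StringIO("order_id,product_id\n7,1\n"))

joined = denormalize(order_products, products, aisles, departments)
print(joined[0])
```

With the real data, each io.StringIO(...) would be replaced by an opened CSV file. For the 30m+ row order_products files, a dataframe library such as pandas is the more usual choice, with a merge playing the role of the dictionary lookups above.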

Exploratory Data Analysis

Most Ordered Products
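
A ranking of the most ordered products can be obtained by counting how often each product id appears across all order lines. The sketch below assumes each order line carries a product_id column, as in the Kaggle files; the helper name most_ordered, the sample rows, and the product names are hypothetical.

```python
from collections import Counter


def most_ordered(order_lines, product_names, n=10):
    """Return the n most frequently ordered products as (name, count) pairs."""
    counts = Counter(line["product_id"] for line in order_lines)
    return [(product_names[pid], count) for pid, count in counts.most_common(n)]


# Made-up order lines: one dict per product in an order.
order_lines = [
    {"order_id": 1, "product_id": 10},
    {"order_id": 1, "product_id": 11},
    {"order_id": 2, "product_id": 10},
    {"order_id": 3, "product_id": 10},
    {"order_id": 3, "product_id": 12},
    {"order_id": 3, "product_id": 11},
]
product_names = {10: "Product A", 11: "Product B", 12: "Product C"}

top = most_ordered(order_lines, product_names, n=2)
print(top)
```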

Orders Distributions

Most orders are placed on Mondays and Tuesdays. On any given day, most orders are placed between 9:00 AM and 5:00 PM.

Combining the two views, order volume is highest between 9:00 AM and 5:00 PM on Mondays and Tuesdays.
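
These distributions come down to counting orders per day of the week and hour of the day. The sketch below assumes the orders file exposes these as integer columns named order_dow (0 to 6) and order_hour_of_day (0 to 23). Which integer corresponds to which weekday is not stated here, so the code keeps the raw codes. The sample orders are made up.

```python
from collections import Counter


def orders_by_day_and_hour(orders):
    """Count orders for each (day of week, hour of day) pair."""
    return Counter((o["order_dow"], o["order_hour_of_day"]) for o in orders)


def totals_by_day(grid):
    """Collapse the (day, hour) counts into per-day totals."""
    totals = Counter()
    for (day, _hour), count in grid.items():
        totals[day] += count
    return totals


def share_in_window(grid, start_hour=9, end_hour=17):
    """Fraction of orders placed with start_hour <= hour < end_hour."""
    total = sum(grid.values())
    inside = sum(c for (_day, hour), c in grid.items()
                 if start_hour <= hour < end_hour)
    return inside / total if total else 0.0


# Made-up orders; day codes are raw order_dow values, not weekday names.
orders = [
    {"order_dow": 0, "order_hour_of_day": 10},
    {"order_dow": 0, "order_hour_of_day": 14},
    {"order_dow": 1, "order_hour_of_day": 9},
    {"order_dow": 1, "order_hour_of_day": 20},
    {"order_dow": 3, "order_hour_of_day": 16},
]

grid = orders_by_day_and_hour(orders)
print(totals_by_day(grid).most_common())
print(share_in_window(grid))
```

The (day, hour) counter can be fed directly into a heatmap, while the per-day totals and the 9:00 AM to 5:00 PM share give the two summary statements as numbers.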

Days since the Prior Order

Customers usually reorder 5 to 15 days after the prior order. Order counts peak at one week and at one month after the prior order.
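
The gap distribution can be computed as a histogram over the time between consecutive orders. The sketch below assumes a column named days_since_prior_order that is blank for a user's first order and otherwise holds a number stored as text such as "7.0", as in the Kaggle CSV. The sample values are made up. If the gaps in the file are capped at 30 days, as appears to be the case, the peak at one month also absorbs every longer gap.

```python
from collections import Counter


def days_since_prior_histogram(orders):
    """Count orders per gap length, skipping first orders (no prior order)."""
    gaps = Counter()
    for order in orders:
        raw = order["days_since_prior_order"]
        if raw in ("", None):
            continue  # first order of a user: there is no prior order
        gaps[int(float(raw))] += 1
    return gaps


def share_between(gaps, low, high):
    """Fraction of repeat orders with low <= gap <= high (in days)."""
    total = sum(gaps.values())
    inside = sum(c for gap, c in gaps.items() if low <= gap <= high)
    return inside / total if total else 0.0


# Made-up orders; the blank value marks a first order.
orders = [
    {"days_since_prior_order": ""},
    {"days_since_prior_order": "7.0"},
    {"days_since_prior_order": "7.0"},
    {"days_since_prior_order": "30.0"},
    {"days_since_prior_order": "15.0"},
]

gaps = days_since_prior_histogram(orders)
print(gaps.most_common(1))
print(share_between(gaps, 5, 15))
```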

Products Distributions

Products Distributions in Each Department

It can be seen that Personal Care and Snacks are the departments with the most products.

Products Distributions in Each Aisle

Candy chocolate, chips pretzels, and cookies cakes are among the aisles with the most products.
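
The number of products per department or per aisle can be computed from the products table alone, assuming it holds one row per product with aisle_id and department_id columns. In the sketch below, the helper name products_per_group is hypothetical, and the rows and ids are made up rather than taken from the dataset.

```python
from collections import Counter


def products_per_group(products, id_column, names):
    """Count products per group and return (group name, count), largest first."""
    counts = Counter(names[p[id_column]] for p in products)
    return counts.most_common()


# Made-up product rows and id-to-name lookups.
products = [
    {"product_id": 1, "aisle_id": 45, "department_id": 19},
    {"product_id": 2, "aisle_id": 45, "department_id": 19},
    {"product_id": 3, "aisle_id": 107, "department_id": 19},
    {"product_id": 4, "aisle_id": 20, "department_id": 11},
]
department_names = {19: "snacks", 11: "personal care"}
aisle_names = {45: "candy chocolate", 107: "chips pretzels", 20: "oral hygiene"}

by_department = products_per_group(products, "department_id", department_names)
by_aisle = products_per_group(products, "aisle_id", aisle_names)
print(by_department)
print(by_aisle[0])
```

Note that this counts how many distinct products a group contains, which is a different quantity from how many units the group sells. Sales figures require the order_products files as well.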

Sale Distributions

Best Selling Departments

Best Selling Aisles


References

  1. Kaggle Dataset: Instacart Market Basket Analysis